Improving Effective Bandwidth through Compiler Enhancement of Global Cache Reuse
Authors
Abstract
Reusing data in cache is critical to achieving high performance on modern machines, because it reduces the impact of the latency and bandwidth limitations of direct memory access. To date, most studies of software memory hierarchy management have focused on the latency problem in loops. However, today's machines are increasingly limited by insufficient memory bandwidth; latency-oriented techniques are inadequate because they do not seek to minimize the amount of data transferred from memory over the whole program. To address the bandwidth limitation, this paper explores the potential for global cache reuse, that is, reusing data across loop nests and over the entire program. In particular, the paper investigates a two-step strategy. The first step fuses computations on the same data to enable the caching of repeated accesses. The second step groups data used by the same computation to make them contiguous in memory. While the first step reduces the frequency of memory access, the second step improves its efficiency. The paper demonstrates the effectiveness of this strategy and shows how to automate it in a production compiler.
Similar resources
Improving Effective Bandwidth through Compiler Enhancement of Global and Dynamic Cache Reuse
While CPU speed has been improved by a factor of 6400 over the past twenty years, memory bandwidth has increased by a factor of only 139 during the same period. Consequently, on modern machines the limited data supply simply cannot keep a CPU busy, and applications often utilize only a few percent of peak CPU performance. The hardware solution, which provides layers of high-bandwidth data cache...
Full text
Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential
Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak processing rate to memory bandwidth) as highlighted by recent studies on Exascale architectural trends. Further, flops are getting cheaper while the energy cost of data movement is increasingly dominant. The understanding and characterization of data locality properties of computations is critical in or...
Full text
Cache-Partitioned Tiling for Data Reuse Across Loop Nests
This paper presents cache-partitioned tiling, a systematic and integrated approach for global optimization of cache locality across multiple loop nests which reference multiple arrays. The approach is based on the idea of cache partitioning, in which the cache capacity is divided into a number of equal-sized sections. A data layout in memory is derived to eliminate cache conflicts by ensuring th...
Full text
Program Transformations for Cache Locality Enhancement on Shared-Memory Multiprocessors
Program Transformations for Cache Locality Enhancement on Shared-memory Multiprocessors Naraig Manjikian Doctor of Philosophy Graduate Department of Electrical and Computer Engineering University of Toronto 1997 This dissertation proposes and evaluates compiler techniques that enhance cache locality and consequently improve the performance of parallel applications on shared-memory multiprocesso...
Full text
Reducing Memory Bandwidth Consumption Via Compiler-Driven Selective Sub-Blocking
As processors continue to deliver higher levels of performance and as memory latency tolerance techniques become widespread to address the increasing cost of accessing memory, memory bandwidth will emerge as a major performance bottleneck. Rather than rely solely on wider and faster memories to address memory bandwidth shortages, an alternative is to use existing memory bandwidth more efficient...
Full text
Journal title:
Volume Issue
Pages -
Publication year: 2001